Automated Data Extraction Using Predictive Program Synthesis

Authors

  • Mohammad Raza
  • Sumit Gulwani
Abstract

In recent years there has been rising interest in the use of programming-by-example techniques to assist users in data manipulation tasks. Such techniques rely on an explicit specification of input-output examples from the user to automatically synthesize programs. However, in a wide range of data extraction tasks it is easy for a human observer to predict the desired extraction by just observing the input data itself. Such predictive intelligence has not yet been explored in program synthesis research, and is what we address in this work. We describe a predictive program synthesis algorithm that infers programs in a general form of extraction DSLs (domain-specific languages) given input-only examples. We describe concrete instantiations of such DSLs and the synthesis algorithm in the two practical application domains of text extraction and web extraction, and present an evaluation of our technique on a range of extraction tasks encountered in practice.
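To make the idea of synthesis from input-only examples concrete, the following is a minimal toy sketch, not the authors' algorithm or DSL. It assumes a deliberately tiny hypothetical "DSL" (split on a single delimiter) and a hypothetical ranking heuristic: prefer the delimiter that occurs the same nonzero number of times in every input. The names `CANDIDATE_DELIMITERS`, `infer_delimiter`, and `synthesize_extractor` are illustrative inventions.

```python
# Toy illustration only: a hypothetical one-operator "DSL" (split on a delimiter).
# The candidate set and the ranking heuristic are assumptions, not from the paper.
CANDIDATE_DELIMITERS = [",", ";", "\t", "|", ":", " "]


def infer_delimiter(inputs):
    """Return the candidate that splits every input into the same number of
    fields (more than one), preferring the candidate yielding the most fields.
    Returns None when no candidate is consistent across all inputs."""
    best, best_count = None, 0
    for delim in CANDIDATE_DELIMITERS:
        counts = {s.count(delim) for s in inputs}
        if len(counts) == 1:          # same number of occurrences in every input
            n = next(iter(counts))
            if n > best_count:        # ignore n == 0 and keep the richest split
                best, best_count = delim, n
    return best


def synthesize_extractor(inputs):
    """Build an extraction function from input strings alone (no output examples)."""
    delim = infer_delimiter(inputs)
    if delim is None:
        return lambda s: [s]          # fall back to treating the input as one field
    return lambda s: s.split(delim)


rows = ["alice;30;NYC", "bob;25;LA", "carol;41;SF"]
extract = synthesize_extractor(rows)
fields = [extract(r) for r in rows]
print(fields)
```

No output examples are supplied here: the extractor is chosen purely from regularities in the inputs, which is the contrast with programming-by-example that the abstract draws. A realistic system would search a far richer program space than this single split operator.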

Similar resources

Application of the rule extraction method to evaluate seismicity of Iran

Assessing seismic hazards involves specifying the likelihood, magnitude and location of earthquakes in a region. Predicting the seismic hazards is the first step in reducing the impact of the damage caused by an earthquake. In this study, to fully utilize all the known parameters which may possibly affect the occurrence of earthquakes (mb ≥ 4.5), a data-driven rule-extraction method called the...

Whatever Happened to Deductive Question Answering?

Deductive question answering, the extraction of answers to questions from machine-discovered proofs, is the poor cousin of program synthesis. It involves much of the same technology—theorem proving and answer extraction—but the bar is lower. Instead of constructing a general program to meet a given specification for any input—the program synthesis problem—we need only construct answers for spec...

Synthesizing Database Transactions

Database programming requires having the knowledge of database semantics both to maintain database integrity and to explore more optimization opportunities. Automated programming of database transactions is desirable and feasible. In general, transactions use simple constructs and algorithms; specifications of database semantics are available; and transactions perform small incremental updates ...

Toward completely automated vowel extraction: Introducing DARLA

Automatic Speech Recognition (ASR) is reaching further and further into everyday life with Apple’s Siri, Google voice search, automated telephone information systems, dictation devices, closed captioning, and other applications. Along with such advances in speech technology, sociolinguists have been considering new methods for alignment and vowel formant extraction, including techniques like th...

Lutetium-177 DOTATATE Production with an Automated Radiopharmaceutical Synthesis System

Objective(s): Peptide Receptor Radionuclide Therapy (PRRT) with yttrium-90 (90Y) and lutetium-177 (177Lu)-labelled SST analogues is now a therapy option for patients who have failed to respond to conventional medical therapy. In-house production with automated PRRT synthesis systems has clear advantages over manual methods, resulting in increasing use in hospital-based radiopharmacies. We report...

Publication date: 2017